The present invention is directed to a method for distinguishing CBF (core binding factor)-positive
AML subtypes, preferably AML_t(8;21) and/or AML_inv(16), from CBF-negative
AML subtypes, preferably from AML_inv(3), AML_t(15;17),
AML_t(11q23)/MLL (AML_MLL), and/or AML_komplext (complex aberrant karyotype)
by determining the expression level of selected marker genes.


Leukemias are classified into four different groups or types: acute myeloid (AML), acute
lymphatic (ALL), chronic myeloid (CML) and chronic lymphatic leukemia (CLL). Within
these groups, several subcategories can be identified further using a panel of standard
techniques as described below. These different subcategories in leukemias are associated
with varying clinical outcome and therefore are the basis for different treatment strategies.
The importance of highly specific classification may be illustrated in detail further for the
AML as a very heterogeneous group of diseases. Effort is aimed at identifying biological
entities and to distinguish and classify subgroups of AML which are associated with a
favorable, intermediate or unfavorable prognosis, respectively. In 1976, the FAB
classification was proposed by the French-American-British co-operative group which was
based on cytomorphology and cytochemistry in order to separate AML subgroups
according to the morphological appearance of blasts in the blood and bone marrow. In
addition, it was recognized that genetic abnormalities occurring in the leukemic blast had a
major impact on the morphological picture and even more on the prognosis. So far, the
karyotype of the leukemic blasts is the most important independent prognostic factor
regarding response to therapy as well as survival.

Usually, a combination of methods is necessary to obtain the most important information
in leukemia diagnostics: Analysis of the morphology and cytochemistry of bone marrow
blasts and peripheral blood cells is necessary to establish the diagnosis. In some cases the
addition of immunophenotyping is mandatory to separate very undifferentiated AML from
acute lymphoblastic leukemia and CLL. Leukemia subtypes investigated can be diagnosed
by cytomorphology alone, only if an expert reviews the smears. However, a genetic
analysis based on chromosome analysis, fluorescence in situ hybridization or RT-PCR and
immunophenotyping is required in order to assign all cases into the right category. The aim
of these techniques besides diagnosis is mainly to determine the prognosis of the leukemia.
A major disadvantage of these methods, however, is that viable cells are necessary as the
cells for genetic analysis have to divide in vitro in order to obtain metaphases for the
analysis. Another problem is the long time of 72 hours from receipt of the material in the
laboratory to obtain the result. Furthermore, great experience in preparation of
chromosomes and even more in analyzing the karyotypes is required to obtain the correct
result in at least 90% of cases. Using these techniques in combination, hematological
malignancies in a first approach are separated into chronic myeloid leukemia (CML),
chronic lymphatic (CLL), acute lymphoblastic (ALL), and acute myeloid leukemia (AML).
Within the latter three disease entities several prognostically relevant subtypes have been
established. As a second approach this further sub-classification is based mainly on genetic
abnormalities of the leukemic blasts and clearly is associated with different prognoses.

The sub-classification of leukemias becomes increasingly important to guide therapy. The
development of new, specific drugs and treatment approaches requires the identification of
specific subtypes that may benefit from a distinct therapeutic protocol and, thus, can
improve outcome of distinct subsets of leukemia. For example, the new therapeutic drug
(STI571, Imatinib) inhibits the CML specific chimeric tyrosine kinase BCR-ABL
generated from the genetic defect observed in CML, the BCR-ABL-rearrangement due to
the translocation between chromosomes 9 and 22 (t(9;22)(q34;q11)). In patients treated
with this new drug, the therapy response is dramatically higher as compared to all other
drugs that had been used so far. Another example is the subtype of acute myeloid leukemia
AML M3 and its variant M3v both with karyotype t(15;17)(q22;q11-12). The introduction
of a new drug (all-trans retinoic acid, ATRA) has improved the outcome in this subgroup
of patients from about 50% to 85% long-term survivors. As it is mandatory for these
patients suffering from these specific leukemia subtypes to be identified as fast as possible
so that the best therapy can be applied, diagnostics today must accomplish
sub-classification with maximal precision. Not only for these subtypes but also for several
other leukemia subtypes different treatment approaches could improve outcome.
Therefore, rapid and precise identification of distinct leukemia subtypes is the future goal
for diagnostics.

Thus, the technical problem underlying the present invention was to provide means for
leukemia diagnostics which overcome at least some of the disadvantages of the prior art
diagnostic methods, in particular encompassing the time-consuming and unreliable
combination of different methods and which provides a rapid assay to unambiguously
distinguish one AML subtype from another, e.g. by genetic analysis.

According to Golub et al. (Science, 1999, 286, 531-7), gene expression profiles can be
used for class prediction and discriminating AML from ALL samples. However, for the
analysis of acute leukemias the selection of the two different subgroups was performed
using exclusively morphologic-phenotypical criteria. This was only descriptive and does
not provide deeper insights into the pathogenesis or the underlying biology of the
leukemia. The approach reproduces only very basic knowledge of cytomorphology and
intends to differentiate classes. The data is not sufficient to predict prognostically relevant
cytogenetic aberrations.



Furthermore, the international application WO-A 03/039443 discloses marker genes the
expression levels of which are characteristic for certain leukemia, e.g. AML subtypes and
additionally discloses methods for differentiating between the subtype of AML cells by
determining the expression profile of the disclosed marker genes. However, WO-A
03/039443 does not provide guidance which set of distinct genes discriminate between two
subtypes and, as such, can be routinely taken in order to distinguish one AML subtype
from another.



The problem is solved by the present invention, which provides a method for
distinguishing CBF-positive AML subtypes, preferably AML_t(8;21) and/or AML_inv(16),
from CBF-negative AML subtypes, preferably from AML_inv(3), AML_t(15;17),
AML_t(11q23)/MLL, and/or AML_komplext, in a sample, the method comprising
determining the expression level of markers selected from the markers identifiable by their
Affymetrix Identification Numbers (affy id) as defined in Tables 1, and/or 2,

wherein

a lower expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 1.1 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 1.1 having a positive fc value,

is indicative for the presence of AML_CBF when AML_CBF is distinguished
from all other subtypes,

and/or wherein

a lower expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 1.2 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 1.2 having a positive fc value,

is indicative for the presence of AML_MLL when AML_MLL is distinguished
from all other subtypes,

and/or wherein

a lower expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 1.3 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 1.3 having a positive fc value,

is indicative for the presence of AML_inv(3) when AML_inv(3) is
distinguished from all other subtypes,

and/or wherein 

a lower expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 1.4 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 1.4 having a positive fc value,

is indicative for the presence of AML_komplext when AML_komplext is
distinguished from all other subtypes,

and/or wherein

a lower expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 1.5 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 1.5 having a positive fc value,

is indicative for the presence of AML_t(15;17) when AML_t(15;17) is
distinguished from all other subtypes,

and/or wherein 

a lower expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.1 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.1 having a positive fc value,

is indicative for the presence of AML_CBF when AML_CBF is distinguished
from AML_MLL,

and/or wherein 

a lower expression of at least one polynucleotide defined by at least one of the 
numbers 1 to 50 of Table 2.2 having a negative fc value, and/or 

a higher expression of at least one polynucleotide defined by at least one of the 
numbers 1 to 50 of Table 2.2 having a positive fc value, 

is indicative for the presence of AML_CBF when AML_CBF is distinguished
from AML_inv(3),

and/or wherein

a lower expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.3 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.3 having a positive fc value,

is indicative for the presence of AML_CBF when AML_CBF is distinguished
from AML_komplext,

and/or wherein

a lower expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.4 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.4 having a positive fc value,

is indicative for the presence of AML_CBF when AML_CBF is distinguished
from AML_t(15;17),

and/or wherein

a lower expression of at least one polynucleotide defined by at least one of the 
numbers 1 to 50 of Table 2.5 having a negative fc value, and/or 

a higher expression of at least one polynucleotide defined by at least one of the 
numbers 1 to 50 of Table 2.5 having a positive fc value, 

is indicative for the presence of AML_MLL when AML_MLL is distinguished
from AML_inv(3),

and/or wherein

a lower expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.6 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.6 having a positive fc value,

is indicative for the presence of AML_MLL when AML_MLL is distinguished
from AML_komplext,

and/or wherein 

a lower expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.7 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.7 having a positive fc value,

is indicative for the presence of AML_MLL when AML_MLL is distinguished
from AML_t(15;17),

and/or wherein 

a lower expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.8 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.8 having a positive fc value,

is indicative for the presence of AML_inv(3) when AML_inv(3) is
distinguished from AML_komplext,

and/or wherein

a lower expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.9 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.9 having a positive fc value,

is indicative for the presence of AML_inv(3) when AML_inv(3) is
distinguished from AML_t(15;17),

and/or wherein

a lower expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.10 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.10 having a positive fc value,

is indicative for the presence of AML_komplext when AML_komplext is
distinguished from AML_t(15;17).
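
The assignment of tables to comparisons recited above can be collected in a small lookup structure. The following Python sketch is illustrative only: the names COMPARISONS and describe_comparison are hypothetical and do not appear in the disclosure, and the entries merely restate which subtype each table is said to indicate and against which subtype or subtypes it is distinguished.

```python
# Hypothetical lookup: table number -> (subtype indicated, distinguished from).
# The entries restate the table assignments recited in the text above.
ALL_OTHERS = "all other subtypes"

COMPARISONS = {
    "1.1": ("AML_CBF", ALL_OTHERS),
    "1.2": ("AML_MLL", ALL_OTHERS),
    "1.3": ("AML_inv(3)", ALL_OTHERS),
    "1.4": ("AML_komplext", ALL_OTHERS),
    "1.5": ("AML_t(15;17)", ALL_OTHERS),
    "2.1": ("AML_CBF", "AML_MLL"),
    "2.2": ("AML_CBF", "AML_inv(3)"),
    "2.3": ("AML_CBF", "AML_komplext"),
    "2.4": ("AML_CBF", "AML_t(15;17)"),
    "2.5": ("AML_MLL", "AML_inv(3)"),
    "2.6": ("AML_MLL", "AML_komplext"),
    "2.7": ("AML_MLL", "AML_t(15;17)"),
    "2.8": ("AML_inv(3)", "AML_komplext"),
    "2.9": ("AML_inv(3)", "AML_t(15;17)"),
    "2.10": ("AML_komplext", "AML_t(15;17)"),
}


def describe_comparison(table_id: str) -> str:
    """Return a one-line description of the comparison behind a table."""
    indicated, versus = COMPARISONS[table_id]
    return f"Table {table_id}: {indicated} distinguished from {versus}"
```

For example, describe_comparison("2.3") yields the comparison of AML_CBF against AML_komplext.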



As used herein, the following definitions apply to the above abbreviations:
CBF: core binding factor
AML_t(8;21): AML with t(8;21) translocation
AML_inv(16): AML with inversion (16)
AML_inv(3): AML with inversion (3)
AML_t(15;17): AML with t(15;17) translocation
AML_t(11q23)/MLL (AML_MLL): AML with translocation t(11q23) in the mixed lineage
leukemia gene (MLL)
AML_komplext: AML with complex aberrant karyotype

As used herein, "all other subtypes" refer to the subtypes of the present invention, i.e. if
one subtype is distinguished from "all other subtypes", it is distinguished from all other
subtypes contained in the present invention.

According to the present invention, a "sample" means any biological material containing
genetic information in the form of nucleic acids or proteins obtainable or obtained from an
individual. The sample includes e.g. tissue samples, cell samples, bone marrow and/or
body fluids such as blood, saliva, semen. Preferably, the sample is blood or bone marrow,
more preferably the sample is bone marrow. The person skilled in the art is aware of
methods, how to isolate nucleic acids and proteins from a sample. A general method for
isolating and preparing nucleic acids from a sample is outlined in Example 3.

According to the present invention, the term "lower expression" is generally assigned to all
polynucleotides definable by numbers and Affymetrix Id. the t-values and fold change (fc)
values of which are negative, as indicated in the Tables. Accordingly, the term "higher
expression" is generally assigned to all polynucleotides definable by numbers and
Affymetrix Id. the t-values and fold change (fc) values of which are positive.
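
The sign convention just defined can be written as a short rule. The Python sketch below is a hedged illustration, not part of the disclosure: the function name expression_term is hypothetical, and treating a sign mismatch between the t-value and the fc value as an error is an assumption, since the text only describes the case where both share a sign.

```python
def expression_term(fc: float, t_value: float) -> str:
    """Map a tabulated fold change (fc) and t-value to the term used in the text.

    Both negative -> "lower expression"; both positive -> "higher expression".
    Zero values or mismatched signs are rejected (an assumption of this sketch).
    """
    if fc < 0 and t_value < 0:
        return "lower expression"
    if fc > 0 and t_value > 0:
        return "higher expression"
    raise ValueError("fc and t-value must be non-zero and share the same sign")
```

A marker tabulated with fc = -3.2 and t = -12.1 would thus be read as "lower expression" in the subtype named for that table.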


According to the present invention, the term "expression" refers to the process by which
mRNA or a polypeptide is produced based on the nucleic acid sequence of a gene, i.e.
"expression" also includes the formation of mRNA upon transcription. In accordance with
the present invention, the term "determining the expression level" preferably refers to the
determination of the level of expression, namely of the markers.

Generally, "marker" refers to any genetically controlled difference which can be used in
the genetic analysis of a test versus a control sample, for the purpose of assigning the
sample to a defined genotype or phenotype. As used herein, "markers" refer to genes
which are differentially expressed in, e.g., different AML subtypes. The markers can be
defined by their gene symbol name, their encoded protein name, their transcript
identification number (cluster identification number), the data base accession number,
public accession number or GenBank identifier or, as done in the present invention,
Affymetrix identification number, chromosomal location, UniGene accession number and
cluster type, LocusLink accession number (see Examples and Tables).

The Affymetrix identification number (affy id) is accessible for anyone and the person
skilled in the art by entering the "gene expression omnibus" internet page of the National
Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/geo/). In
particular, the affy id's of the polynucleotides used for the method of the present invention
are derived from the so-called U133 chip. The sequence data of each identification number
can be viewed at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL96


Generally, the expression level of a marker is determined by determining the
expression of its corresponding "polynucleotide" as described hereinafter.

According to the present invention, the term "polynucleotide" refers, generally, to a DNA,
in particular cDNA, or RNA, in particular a cRNA, or a portion thereof or a polypeptide or
a portion thereof. In the case of RNA (or cDNA), the polynucleotide is formed upon
transcription of a nucleotide sequence which is capable of expression. The polynucleotide
fragments refer to fragments preferably of between at least 8, such as 10, 12, 15 or 18
nucleotides and at least 50, such as 60, 80, 100, 200 or 300 nucleotides in length, or a
complementary sequence thereto, representing a consecutive stretch of nucleotides of a
gene, cDNA or mRNA. In other terms, polynucleotides include also any fragment (or
complementary sequence thereto) of a sequence derived from any of the markers defined
above as long as these fragments unambiguously identify the marker.

The determination of the expression level may be effected at the transcriptional or
translational level, i.e. at the level of mRNA or at the protein level. Protein fragments such
as peptides or polypeptides advantageously comprise between at least 6 and at least 25,
such as 30, 40, 80, 100 or 200 consecutive amino acids representative of the corresponding
full length protein. Six amino acids are generally recognized as the lowest peptidic stretch
giving rise to a linear epitope recognized by an antibody, fragment or derivative thereof.
Alternatively, the proteins or fragments thereof may be analysed using nucleic acid
molecules specifically binding to three-dimensional structures (aptamers).

Depending on the nature of the polynucleotide or polypeptide, the determination of the
expression levels may be effected by a variety of methods. For determining and detecting
the expression level, it is preferred in the present invention that the polynucleotide, in
particular the cRNA, is labelled.






The labelling of the polynucleotide or a polypeptide can occur by a variety of methods
known to the skilled artisan. The label can be fluorescent, chemiluminescent,
bioluminescent, radioactive (such as a radioactive phosphorus isotope). The labelling
compound can be any labelling compound being suitable for the labelling of
polynucleotides and/or polypeptides.
Examples include fluorescent dyes, such as fluorescein, dichlorofluorescein,
hexachlorofluorescein, BODIPY variants, ROX, tetramethylrhodamin, rhodamin X,
Cyanine-2, Cyanine-3, Cyanine-5, Cyanine-7, IRD40, FluorX, Oregon Green, Alexa
variants (available e.g. from Molecular Probes or Amersham Biosciences) and the like,
biotin or biotinylated nucleotides, digoxigenin, radioisotopes, antibodies, enzymes and
receptors. Depending on the type of labelling, the detection is done via fluorescence
measurements, conjugation to streptavidin and/or avidin, antigen-antibody- and/or
antibody-antibody-interactions, radioactivity measurements, as well as catalytic and/or
receptor/ligand interactions. Suitable methods include the direct labelling (incorporation)
method, the amino-modified (aminoallyl) nucleotide method (available e.g. from
Ambion), and the primer tagging method (DNA dendrimer labelling, as kit available e.g.
from Genisphere). Particularly preferred for the present invention is the use of biotin or
biotinylated nucleotides for labelling, with the latter being directly incorporated into, e.g.
the cRNA polynucleotide by in vitro transcription.

If the polynucleotide is mRNA, cDNA may be prepared into which a detectable label, as
exemplified above, is incorporated. Said detectably labelled cDNA, in single-stranded
form, may then be hybridised, preferably under stringent or highly stringent conditions to a
panel of single-stranded oligonucleotides representing different genes and affixed to a solid
support such as a chip. Upon applying appropriate washing steps, those cDNAs will be
detected or quantitatively detected that have a counterpart in the oligonucleotide panel.
Various advantageous embodiments of this general method are feasible. For example, the
mRNA or the cDNA may be amplified e.g. by polymerase chain reaction, wherein it is
preferable, for quantitative assessments, that the number of amplified copies corresponds
relative to further amplified mRNAs or cDNAs to the number of mRNAs originally
present in the cell. In a preferred embodiment of the present invention, the cDNAs are
transcribed into cRNAs prior to the hybridisation step wherein only in the transcription
step a label is incorporated into the nucleic acid and wherein the cRNA is employed for
hybridisation. Alternatively, the label may be attached subsequent to the transcription step.



Similarly, proteins from a cell or tissue under investigation may be contacted with a panel
of aptamers or of antibodies or fragments or derivatives thereof. The antibodies etc. may be
affixed to a solid support such as a chip. Binding of proteins indicative of an AML subtype
may be verified by binding to a detectably labelled secondary antibody or aptamer. For the
labelling of antibodies, it is referred to Harlow and Lane, "Antibodies, a laboratory
manual", CSH Press, 1988, Cold Spring Harbor. Specifically, a minimum set of proteins
necessary for diagnosis of all AML subtypes may be selected for creation of a protein array
system to make diagnosis on a protein lysate of a diagnostic bone marrow sample directly.
Protein Array Systems for the detection of specific protein expression profiles already are
available (for example: Bio-Plex, BIORAD, München, Germany). For this application
preferably antibodies against the proteins have to be produced and immobilized on a
platform e.g. glass slides or microtiter plates. The immobilized antibodies can be labelled
with a reactant specific for the certain target proteins as discussed above. The reactants can
include enzyme substrates, DNA, receptors, antigens or antibodies to create for example a
capture sandwich immunoassay.

For reliably distinguishing AML subtypes it is useful that the expression of more than one
of the above defined markers is determined. As a criterion for the choice of markers, the
statistical significance of markers as expressed in q or p values based on the concept of the
false discovery rate is determined. In doing so, a measure of statistical significance called
the q value is associated with each tested feature. The q value is similar to the p value,
except it is a measure of significance in terms of the false discovery rate rather than the
false positive rate (Storey JD and Tibshirani R, Proc. Natl. Acad. Sci., 2003, Vol.
100:9440-5).
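
To make the relationship between p values and q values concrete, the following Python sketch computes q values from a list of p values. It is a simplified illustration under stated assumptions, not the procedure used for the Tables: the proportion of true null hypotheses (pi0) is fixed at 1 by default, in which case the result coincides with Benjamini-Hochberg adjusted p values, whereas Storey and Tibshirani estimate pi0 from the data. The function name q_values is hypothetical.

```python
def q_values(p_values, pi0=1.0):
    """Return FDR-based q values for a list of p values.

    Assumption of this sketch: pi0 (fraction of true nulls) is supplied by
    the caller; with pi0 = 1.0 this equals Benjamini-Hochberg adjustment.
    """
    m = len(p_values)
    # Indices of the p values in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        candidate = pi0 * p_values[i] * m / rank
        running_min = min(running_min, candidate)
        q[i] = running_min
    return q
```

Each q value estimates the false discovery rate incurred when the corresponding feature and all features with smaller p values are called significant.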


In a preferred embodiment of the present invention, markers as defined in Tables 1.1-2.10
having a q-value of less than 3E-06, more preferred less than 1.5E-09, most preferred less
than 1.5E-11, less than 1.5E-20, less than 1.5E-30, are measured.

Of the above defined markers, the expression level of at least two, preferably of at least
ten, more preferably of at least 25, most preferably of 50 of at least one of the Tables of the
markers is determined.

In another preferred embodiment, the expression level of at least 2, of at least 5, of at least
10 out of the markers having the numbers 1-10, 1-20, 1-40, 1-50 of at least one of the
Tables are measured.

The level of the expression of the "marker", i.e. the expression of the polynucleotide is
indicative of the AML subtype of a cell or an organism. The level of expression of a
marker or group of markers is measured and is compared with the level of expression of
the same marker or the same group of markers from other cells or samples. The
comparison may be effected in an actual experiment or in silico. When the expression level
also referred to as expression pattern or expression signature (expression profile) is
measurably different, there is according to the invention a meaningful difference in the
level of expression. Preferably the difference at least is 5%, 10% or 20%, more preferred
at least 50% or may even be as high as 75% or 100%. More preferred the difference in the
level of expression is at least 200%, i.e. two fold, at least 500%, i.e. five fold, or at least
1000%, i.e. 10 fold.
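
The percentages and fold values in the preceding paragraph follow the equivalences stated there (200% corresponds to two fold, 500% to five fold, 1000% to ten fold). The Python sketch below merely restates that arithmetic; the function names are hypothetical, and computing the fold value as the ratio of the larger to the smaller expression level is an assumption of this sketch rather than a definition taken from the text.

```python
def fold_from_percent(percent: float) -> float:
    """Convert a percentage to a fold value using the stated equivalence
    (200% -> 2 fold, 500% -> 5 fold, 1000% -> 10 fold)."""
    return percent / 100.0


def fold_ratio(level_a: float, level_b: float) -> float:
    """Fold difference between two positive expression levels.

    Assumption: the ratio is taken as larger / smaller, so it is >= 1.
    """
    if level_a <= 0 or level_b <= 0:
        raise ValueError("expression levels must be positive")
    return max(level_a, level_b) / min(level_a, level_b)


def meets_fold_threshold(level_a: float, level_b: float, min_percent: float) -> bool:
    """True if the two levels differ by at least the given percentage,
    read as a fold value per the equivalence above."""
    return fold_ratio(level_a, level_b) >= fold_from_percent(min_percent)
```

Under these assumptions, two samples with expression levels of 50 and 10 arbitrary units differ five fold and therefore satisfy the 500% criterion but not the 1000% criterion.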

Accordingly, the expression level of markers expressed lower in a first subtype than in at
least one second subtype, which differs from the first subtype, is at least 5%, 10% or 20%,
more preferred at least 50% or may even be 75% or 100%, i.e. 2-fold lower, preferably at
least 10-fold, more preferably at least 50-fold, and most preferably at least 100-fold lower
in the first subtype. On the other hand, the expression level of markers expressed higher in
a first subtype than in at least one second subtype, which differs from the first subtype, is
at least 5%, 10% or 20%, more preferred at least 50% or may even be 75% or 100%, i.e.
2-fold higher, preferably at least 10-fold, more preferably at least 50-fold, and most
preferably at least 100-fold higher in the first subtype.

In another embodiment of the present invention, the sample is derived from an individual
having leukaemia, preferably AML.

For the method of the present invention it is preferred if the polynucleotide the expression
level of which is determined is in form of a transcribed polynucleotide. A particularly
preferred transcribed polynucleotide is an mRNA, a cDNA and/or a cRNA, with the latter
being preferred. Transcribed polynucleotides are isolated from a sample, reverse
transcribed and/or amplified, and labelled, by employing methods well-known to the person
skilled in the art (see Example 3). In a preferred embodiment of the methods according to
the invention, the step of determining the expression profile further comprises amplifying
the transcribed polynucleotide.

In order to determine the expression level of the transcribed polynucleotide by the method
of the present invention, it is preferred that the method comprises hybridizing the
transcribed polynucleotide to a complementary polynucleotide, or a portion thereof, under
stringent hybridization conditions, as described hereinafter.


The term "hybridizing" means hybridization under conventional hybridization conditions,
preferably under stringent conditions as described, for example, in Sambrook, J., et al., in
"Molecular Cloning: A Laboratory Manual" (1989), Eds. J. Sambrook, E. F. Fritsch and T.
Maniatis, Cold Spring Harbour Laboratory Press, Cold Spring Harbour, NY and the further
definitions provided above. Also contemplated are polynucleotides that hybridize at lower
stringency hybridization conditions. Changes in the stringency of hybridization and signal
detection are primarily accomplished through the manipulation, preferably of formamide
concentration (lower percentages of formamide result in lowered stringency), salt
conditions, or temperature. For example, lower stringency conditions include an overnight
incubation at 37°C in a solution comprising 6X SSPE (20X SSPE = 3M NaCl; 0.2M
NaH2PO4; 0.02M EDTA, pH 7.4), 0.5% SDS, 30% formamide, 100 mg/ml salmon sperm
blocking DNA, followed by washes at 50°C with 1X SSPE, 0.1% SDS. In addition, to
achieve even lower stringency, washes performed following stringent hybridization can be
done at higher salt concentrations (e.g. 5x SSC). Variations in the above conditions may be
accomplished through the inclusion and/or substitution of alternate blocking reagents used
to suppress background in hybridization experiments. The inclusion of specific blocking
reagents may require modification of the hybridization conditions described above, due to
problems with compatibility.
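
The buffer strengths named above (6X SSPE for hybridization, 1X SSPE for the washes) follow by simple dilution from the 20X SSPE composition given in the text. The Python sketch below works through that arithmetic; the function names are hypothetical and the calculation assumes ideal linear dilution of the stated stock.

```python
# Molar composition of the 20X SSPE stock as stated in the text (mol/L).
STOCK_20X_SSPE = {"NaCl": 3.0, "NaH2PO4": 0.2, "EDTA": 0.02}


def working_concentrations(working_x: float, stock_x: float = 20.0) -> dict:
    """Molar concentrations at a given working strength (e.g. 6 for 6X)."""
    factor = working_x / stock_x
    return {name: conc * factor for name, conc in STOCK_20X_SSPE.items()}


def stock_volume_ml(final_volume_ml: float, working_x: float,
                    stock_x: float = 20.0) -> float:
    """Volume of stock needed to prepare final_volume_ml at working_x."""
    return final_volume_ml * working_x / stock_x
```

Under these assumptions, 6X SSPE contains about 0.9 M NaCl, 0.06 M NaH2PO4 and 0.006 M EDTA, and 100 ml of it requires 30 ml of the 20X stock.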

"Complementary" and "complementarity", respectively, can be described by the
percentage, i.e. proportion, of nucleotides which can form base pairs between two
polynucleotide strands or within a specific region or domain of the two strands. Generally,
complementary nucleotides are, according to the base pairing rules, adenine and thymine
(or adenine and uracil), and cytosine and guanine. Complementarity may be partial, in
which only some of the nucleic acids' bases are matched according to the base pairing
rules. Or, there may be a complete or total complementarity between the nucleic acids. The
degree of complementarity between nucleic acid strands has effects on the efficiency and
strength of hybridization between nucleic acid strands.

Two nucleic acid strands are considered to be 100% complementary to each other over a
defined length if in a defined region all adenines of a first strand can pair with a thymine
(or an uracil) of a second strand, all guanines of a first strand can pair with a cytosine of a
second strand, all thymines (or uracils) of a first strand can pair with an adenine of a second
strand, and all cytosines of a first strand can pair with a guanine of a second strand, and
vice versa. According to the present invention, the degree of complementarity is
determined over a stretch of 20, preferably 25, nucleotides, i.e. a 60% complementarity
means that within a region of 20 nucleotides of two nucleic acid strands 12 nucleotides of
the first strand can base pair with 12 nucleotides of the second strand according to the
above ruling, either as a stretch of 12 contiguous nucleotides or interspersed by non-pairing
nucleotides, when the two strands are attached to each other over said region of 20
nucleotides. The degree of complementarity can range from at least about 50% to full, i.e.
100% complementarity. Two single nucleic acid strands are said to be "substantially
complementary" when they are at least about 80% complementary, preferably about 90%
or higher. For carrying out the method of the present invention substantial
complementarity is preferred.
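
The degree of complementarity defined above can be computed position by position. The following Python sketch is an illustration only: the function names are hypothetical, and it assumes the two strands are supplied already aligned over the region of interest, with the second strand written in the antiparallel (3' to 5') orientation so that index i of one strand faces index i of the other.

```python
# Watson-Crick pairs named in the text: A-T (or A-U) and G-C.
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(strand1: str, strand2_aligned: str) -> float:
    """Percentage of aligned positions that can form a base pair.

    strand2_aligned must already be written so that position i faces
    position i of strand1 (an assumption of this sketch).
    """
    if not strand1 or len(strand1) != len(strand2_aligned):
        raise ValueError("strands must be non-empty and of equal length")
    paired = sum(
        (a, b) in PAIRS
        for a, b in zip(strand1.upper(), strand2_aligned.upper())
    )
    return 100.0 * paired / len(strand1)


def substantially_complementary(strand1: str, strand2_aligned: str,
                                threshold: float = 80.0) -> bool:
    """True if complementarity reaches the 'about 80%' level from the text."""
    return percent_complementarity(strand1, strand2_aligned) >= threshold
```

With a 20-nucleotide region in which 12 positions pair, the first function returns 60.0, matching the worked figure in the text; such a pair would not count as substantially complementary at the 80% level.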

Preferred methods for detection and quantification of the amount of polynucleotides, i.e.
for the methods according to the invention allowing the determination of the level of
expression of a marker, are those described by Sambrook et al. (1989) or real time methods
known in the art as the TaqMan® method disclosed in WO92/02638 and the corresponding
U.S. 5,210,015, U.S. 5,804,375, U.S. 5,487,972. This method exploits the exonuclease
activity of a polymerase to generate a signal. In detail, the (at least one) target nucleic acid
component is detected by a process comprising contacting the sample with an
oligonucleotide containing a sequence complementary to a region of the target nucleic acid
component and a labeled oligonucleotide containing a sequence complementary to a
second region of the same target nucleic acid component sequence strand, but not
including the nucleic acid sequence defined by the first oligonucleotide, to create a mixture
of duplexes during hybridization conditions, wherein the duplexes comprise the target
nucleic acid annealed to the first oligonucleotide and to the labeled oligonucleotide such
that the 3'-end of the first oligonucleotide is adjacent to the 5'-end of the labeled
oligonucleotide. Then this mixture is treated with a template-dependent nucleic acid
polymerase having a 5' to 3' nuclease activity under conditions sufficient to permit the 5'
to 3' nuclease activity of the polymerase to cleave the annealed, labeled oligonucleotide
and release labeled fragments. The signal generated by the hydrolysis of the labeled
oligonucleotide is detected and/or measured. TaqMan® technology eliminates the need for
a solid phase bound reaction complex to be formed and made detectable. Other methods
include e.g. fluorescence resonance energy transfer between two adjacently hybridized
probes as used in the LightCycler® format described in U.S. 6,174,670.

A preferred protocol if the marker, i.e. the polynucleotide, is in form of a transcribed
polynucleotide, is described in Example 3, where total RNA is isolated, cDNA and,
subsequently, cRNA is synthesized and biotin is incorporated during the transcription
reaction. The purified cRNA is applied to commercially available arrays which can be
obtained e.g. from Affymetrix. The hybridized cRNA is detected according to the methods
described in Example 3. The arrays are produced by photolithography or other methods
known to experts skilled in the art e.g. from U.S. 5,445,934, U.S. 5,744,305, U.S.
5,700,637, U.S. 5,945,334 and EP 0 619 321 or EP 0 373 203, or as described hereinafter in
greater detail.

In another embodiment of the present invention, the polynucleotide or at least one of the
polynucleotides is in form of a polypeptide. In another preferred embodiment, the
expression level of the polynucleotides or polypeptides is detected using a compound
which specifically binds to the polynucleotide or the polypeptide of the present invention.

As used herein, "specifically binding" means that the compound is capable of
discriminating between two or more polynucleotides or polypeptides, i.e. it binds to the
desired polynucleotide or polypeptide, but essentially does not bind unspecifically to a
different polynucleotide or polypeptide.

The compound can be an antibody, or a fragment thereof, an enzyme, a so-called small
molecule compound, a protein-scaffold, preferably an anticalin. In a preferred
embodiment, the compound specifically binding to the polynucleotide or polypeptide is an
antibody, or a fragment thereof.

As used herein, an "antibody" comprises monoclonal antibodies as first described by
Köhler and Milstein in Nature 256 (1975), 495-497 as well as polyclonal antibodies, i.e.
antibodies contained in a polyclonal antiserum. Monoclonal antibodies include those
produced by transgenic mice. Fragments of antibodies include F(ab')2, Fab and Fv
fragments. Derivatives of antibodies include scFvs, chimeric and humanized antibodies.
See, for example Harlow and Lane, loc. cit. For the detection of polypeptides using
antibodies or fragments thereof, the person skilled in the art is aware of a variety of
methods, all of which are included in the present invention. Examples include
immunoprecipitation, Western blotting, enzyme-linked immunosorbent assay (ELISA),
radioimmunoassay (RIA), dissociation-enhanced lanthanide fluoro
immuno assay (DELFIA), scintillation proximity assay (SPA). For detection, it is desirable
if the antibody is labelled by one of the labelling compounds and methods described supra.

In another preferred embodiment of the present invention, the method for distinguishing
CBF-positive AML subtypes from CBF-negative AML subtypes is carried out on an array.

In general, an "array" or "microarray" refers to a linear or two- or three-dimensional
arrangement of preferably discrete nucleic acid or polypeptide probes which comprises an
intentionally created collection of nucleic acid or polypeptide probes of any length spotted
onto a substrate/solid support. The person skilled in the art knows a collection of nucleic
acids or polypeptides spotted onto a substrate/solid support also under the term "array". As
known to the person skilled in the art, a microarray usually refers to a miniaturised array
arrangement, with the probes being attached at a density of at least about 10, 20, 50, 100
nucleic acid molecules referring to different or the same genes per cm². Furthermore,
where appropriate an array can be referred to as "gene chip". The array itself can have
different formats, e.g. libraries of soluble probes, or libraries of probes tethered to resin
beads, silica chips, or other solid supports.

The process of array fabrication is well-known to the person skilled in the art. In the
following, the process for preparing a nucleic acid array is described. Commonly, the
process comprises preparing a glass (or other) slide (e.g. chemical treatment of the glass to
enhance binding of the nucleic acid probes to the glass surface), obtaining DNA sequences
representing genes of a genome of interest, and spotting these sequences of
interest onto the glass slide. Sequences of interest can be obtained via creating a cDNA library
from an mRNA source or by using publicly available databases, such as GenBank, to
annotate the sequence information of custom cDNA libraries or to identify cDNA clones
from previously prepared libraries. Generally, it is recommendable to amplify obtained
sequences by PCR in order to have sufficient amounts of DNA to print on the array. The
liquid containing the amplified probes can be deposited on the array by using a set of
microspotting pins. Ideally, the amount deposited should be uniform. The process can
further include UV-crosslinking in order to enhance immobilization of the probes on the
array.

In a preferred embodiment, the array is a high density oligonucleotide (oligo) array using a
light-directed chemical synthesis process, employing the so-called photolithography
technology. Unlike common cDNA arrays, oligo arrays (according to the Affymetrix
technology) use a single-dye technology. Given the sequence information of the markers,
the sequence can be synthesized directly onto the array, thus bypassing the need for
physical intermediates, such as PCR products, required for making cDNA arrays. For this
purpose, the marker, or partial sequences thereof, can be represented by 14 to 20 features,
preferably by less than 14 features, more preferably less than 10 features, even more
preferably by 6 features or less, with each feature being a short sequence of nucleotides
(oligonucleotide), which is a perfect match (PM) to a segment of the respective gene. The
PM oligonucleotides are paired with mismatch (MM) oligonucleotides which have a single
mismatch at the central base of the oligonucleotide and are used as "controls". The chip
exposure sites are defined by masks and are deprotected by the use of light, followed by a
chemical coupling step resulting in the synthesis of one nucleotide. The masking, light
deprotection, and coupling process can then be repeated to synthesize the next nucleotide,
until the nucleotide chain is of the specified length.
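
The pairing of PM and MM features can be sketched as follows. This is an illustration only: the probe length of 25, the even spacing of features along the segment, and the choice of the complementary base as the mismatch are assumptions made for the example and are not stated in the text above.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(pm):
    """Return an MM probe: the PM sequence with only its central base changed.

    Assumption: the mismatch is created by substituting the complementary base.
    """
    pm = pm.upper()
    mid = len(pm) // 2
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]


def tile_features(segment, probe_len=25, n_features=6):
    """Pick evenly spaced PM probes along `segment` and pair each with its MM.

    `probe_len` and the even spacing are hypothetical design choices.
    """
    if len(segment) < probe_len:
        raise ValueError("segment shorter than one probe")
    last_start = len(segment) - probe_len
    if n_features == 1:
        starts = [0]
    else:
        starts = sorted(
            {round(i * last_start / (n_features - 1)) for i in range(n_features)}
        )
    pairs = []
    for start in starts:
        pm = segment[start:start + probe_len].upper()
        pairs.append((pm, mismatch_probe(pm)))
    return pairs
```

Each returned pair differs at exactly one position, so the MM signal can serve as the control for its PM partner.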

Advantageously, the method of the present invention is carried out in a robotics system
including robotic plating and a robotic liquid transfer system, e.g. using microfluidics, i.e.
channelled structures.

A particularly preferred method according to the present invention is as follows:

1. Obtaining a sample, e.g. bone marrow or peripheral blood aliquots, from a patient
having AML
2. Extracting RNA, preferably mRNA, from the sample
3. Reverse transcribing the RNA into cDNA
4. In vitro transcribing the cDNA into cRNA
5. Fragmenting the cRNA
6. Hybridizing the fragmented cRNA on standard microarrays
7. Determining hybridization

In another embodiment, the present invention is directed to the use of at least one marker
selected from the markers identifiable by their Affymetrix Identification Numbers (affy id)
as defined in Tables 1, and/or 2, for the manufacturing of a diagnostic for distinguishing
CBF-positive AML subtypes from CBF-negative AML subtypes. The use of the present
invention is particularly advantageous for distinguishing CBF-positive AML subtypes
from CBF-negative AML subtypes in an individual having AML. The use of said markers
for diagnosis of CBF-positive AML subtypes from CBF-negative AML subtypes,
preferably based on microarray technology, offers the following advantages: (1) more
rapid and more precise diagnosis, (2) easy to use in laboratories without specialized
experience, (3) abolishes the requirement for analyzing viable cells for chromosome
analysis (transport problem), and (4) very experienced hematologists for cytomorphology
and cytochemistry, immunophenotyping as well as cytogeneticists and molecular biologists
are no longer required.

Accordingly, the present invention refers to a diagnostic kit containing at least one marker
selected from the markers identifiable by their Affymetrix Identification Numbers (affy id)
as defined in Tables 1, and/or 2, for distinguishing CBF-positive AML subtypes from
CBF-negative AML subtypes, in combination with suitable auxiliaries. Suitable auxiliaries,
as used herein, include buffers, enzymes, labelling compounds, and the like. In a preferred
embodiment, the marker contained in the kit is a nucleic acid molecule which is capable of
hybridizing to the mRNA corresponding to at least one marker of the present invention.
Preferably, the at least one nucleic acid molecule is attached to a solid support, e.g. a
polystyrene microtiter dish, nitrocellulose membrane, glass surface or to non-immobilized
particles in solution.

In another preferred embodiment, the diagnostic kit contains at least one reference for a
CBF-positive AML subtype and/or a CBF-negative AML subtype. As used herein, the
reference can be a sample or a data bank.

In another embodiment, the present invention is directed to an apparatus for distinguishing
CBF-positive AML subtypes from CBF-negative AML subtypes in a sample, containing a
reference data bank obtainable by a method comprising
(a) compiling a gene expression profile of a patient sample by determining the
expression level of at least one marker selected from the markers identifiable by
their Affymetrix Identification Numbers (affy id) as defined in Tables 1, and/or
2, and
(b) classifying the gene expression profile by means of a machine learning
algorithm.

According to the present invention, the "machine learning algorithm" is a
computational-based prediction methodology, also known to the person skilled in the art as
"classifier", employed for characterizing a gene expression profile. The signals corresponding to a
certain expression level which are obtained by the microarray hybridization are subjected
to the algorithm in order to classify the expression profile. Supervised learning involves
"training" a classifier to recognize the distinctions among classes and then "testing" the
accuracy of the classifier on an independent test set. For a new, unknown sample the
classifier shall predict into which class the sample belongs.

Preferably, the machine learning algorithm is selected from the group consisting of
Weighted Voting, K-Nearest Neighbors, Decision Tree Induction, Support Vector
Machines (SVM), and Feed-Forward Neural Networks. Most preferably, the machine
learning algorithm is a Support Vector Machine, such as polynomial kernel and Gaussian
Radial Basis Function kernel SVM models.
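
To make the notion of a classifier concrete, the following is a minimal sketch of one of the listed alternatives, K-Nearest Neighbors, and not of the preferred SVM. The Euclidean distance, the default k=3, and the class labels used in the usage note are assumptions chosen for illustration.

```python
import math
from collections import Counter


def knn_predict(train_profiles, train_labels, query, k=3):
    """Majority vote among the k training profiles closest to `query`.

    Profiles are equal-length sequences of expression values; closeness is
    measured by Euclidean distance (an illustrative choice).
    """
    if len(train_profiles) != len(train_labels):
        raise ValueError("one label per training profile is required")
    if not 1 <= k <= len(train_profiles):
        raise ValueError("k must be between 1 and the number of profiles")
    ranked = sorted(
        zip(train_profiles, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

With training profiles labelled, say, "CBF_positive" and "CBF_negative", a new profile is assigned the label held by the majority of its three nearest training profiles.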

The classification accuracy of a given gene list for a set of microarray experiments is
preferably estimated using Support Vector Machines (SVM), because there is evidence that
SVM-based prediction slightly outperforms other classification techniques like k-Nearest
Neighbors (k-NN). The LIBSVM software package version 2.36 was used (SVM-type: C-SVC,
linear kernel (http://www.csie.ntu.edu.tw/~cjlin/libsvm/)). The skilled artisan is
furthermore referred to Brown et al., Proc.Natl.Acad.Sci., 2000; 97: 262-267, Furey et al.,
Bioinformatics. 2000; 16: 906-914, and Vapnik V. Statistical Learning Theory. New York:
Wiley, 1998.

In detail, the classification accuracy of a given gene list for a set of microarray experiments
can be estimated using Support Vector Machines (SVM) as supervised learning technique.
Generally, SVMs are trained using differentially expressed genes which were identified on
a subset of the data and then this trained model is employed to assign new samples to those
trained groups from a second and different data set. Differentially expressed genes were
identified applying ANOVA and t-test-statistics (Welch t-test). Based on identified distinct
gene expression signatures respective training sets consisting of 2/3 of cases and test sets
with 1/3 of cases to assess classification accuracies are designated. Assignment of cases to
training and test set is randomized and balanced by diagnosis. Based on the training set a
Support Vector Machine (SVM) model is built.
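
Two of the steps named above, the Welch t statistic for a single gene and the randomized 2/3 to 1/3 split balanced by diagnosis, can be sketched as follows. The function names, the fixed random seed and the rounding of the cut point are assumptions for illustration; the sketch does not reproduce the ANOVA step or the SVM training itself.

```python
import math
import random
from collections import defaultdict
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch t statistic for one gene (unequal variances and group sizes).

    Each group needs at least two values and the pooled standard error must
    be non-zero.
    """
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Return (train_indices, test_indices), shuffled within each diagnosis."""
    rng = random.Random(seed)
    by_diagnosis = defaultdict(list)
    for index, label in enumerate(labels):
        by_diagnosis[label].append(index)
    train, test = [], []
    for indices in by_diagnosis.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)
```

Splitting within each diagnosis keeps the proportion of every subtype roughly equal in the training and test sets, which is what "balanced by diagnosis" is taken to mean here.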

According to the present invention, the apparent accuracy, i.e. the overall rate of correct
predictions of the complete data set was estimated by 10-fold cross validation. This means
that the data set was divided into 10 approximately equally sized subsets, an SVM-model
was trained for 9 subsets and predictions were generated for the remaining subset. This
training and prediction process was repeated 10 times to include predictions for each
subset. Subsequently the data set was split into a training set, consisting of two thirds of the
samples, and a test set with the remaining one third. Apparent accuracy for the training set
was estimated by 10-fold cross validation (analogous to apparent accuracy for complete
set). An SVM-model of the training set was built to predict diagnosis in the independent test
set, thereby estimating true accuracy of the prediction model. This prediction approach was
applied both for overall classification (multi-class) and binary classification (diagnosis X
=> yes or no). For the latter, sensitivity and specificity were calculated:

Sensitivity = (number of positive samples predicted)/(number of true positives)

Specificity = (number of negative samples predicted)/(number of true negatives)
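
The fold construction and the two ratios can be sketched as below. The sketch reads the ratios as "correctly predicted positives over all truly positive samples" and "correctly predicted negatives over all truly negative samples", which is an interpretation of the wording above; the interleaved assignment of samples to folds is likewise an illustrative choice, and in practice the sample order would be shuffled first.

```python
def k_fold_indices(n_samples, k=10):
    """Partition range(n_samples) into k folds whose sizes differ by at most one."""
    if not 1 <= k <= n_samples:
        raise ValueError("k must be between 1 and n_samples")
    return [list(range(fold, n_samples, k)) for fold in range(k)]


def sensitivity_specificity(truth, predicted, positive):
    """Return (sensitivity, specificity) for the binary question 'is it `positive`?'.

    Both classes must be present in `truth`, otherwise a ratio is undefined.
    """
    pairs = list(zip(truth, predicted))
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    true_neg = sum(t != positive and p != positive for t, p in pairs)
    n_pos = sum(t == positive for t in truth)
    n_neg = len(truth) - n_pos
    return true_pos / n_pos, true_neg / n_neg
```

In each of the 10 rounds, one fold serves as the prediction subset and the other nine are used for training; the predictions collected over all rounds then give the apparent accuracy.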

In a preferred embodiment, the reference data bank is backed up on a computational data
memory chip which can be inserted in as well as removed from the apparatus of the present
invention, e.g. like an interchangeable module, in order to use another data memory chip
containing a different reference data bank.

The apparatus of the present invention containing a desired reference data bank can be
used in a way such that an unknown sample is, first, subjected to gene expression profiling,
e.g. by microarray analysis in a manner as described supra or in the art, and the expression
level data obtained by the analysis are, second, fed into the apparatus and compared with
the data of the reference data bank obtainable by the above method. For this purpose, the
apparatus suitably contains a device for entering the expression level of the data, for
example a control panel such as a keyboard. The results, whether and how the data of the
unknown sample fit into the reference data bank can be made visible on a provided
monitor or display screen and, if desired, printed out on an incorporated or connected
printer.

Alternatively, the apparatus of the present invention is equipped with particular appliances
suitable for detecting and measuring the expression profile data and, subsequently,
proceeding with the comparison with the reference data bank. In this embodiment, the
apparatus of the present invention can contain a gripper arm and/or a tray which takes up
the microarray containing the hybridized nucleic acids.

In another embodiment, the present invention refers to a reference data bank for
distinguishing CBF-positive AML subtypes from CBF-negative AML subtypes in a sample
obtainable by a method comprising
(a) compiling a gene expression profile of a patient sample by determining the
expression level of at least one marker selected from the markers identifiable by
their Affymetrix Identification Numbers (affy id) as defined in Tables 1, and/or
2, and
(b) classifying the gene expression profile by means of a machine learning
algorithm.

Preferably, the reference data bank is backed up and/or contained in a computational
memory data chip.

The invention is further illustrated in the following table and examples, without limiting
the scope of the invention:

TABLES 1.1-2.10 

Tables 1.1-2.10 show AML subtype analysis of CBF (core binding factor)-positive AML
subtypes, preferably AML_t(8;21) and AML_inv(16), from CBF-negative AML subtypes,
preferably from AML_inv(3), AML_t(15;17), AML_t(11q23)/MLL, and/or
AML_komplext (complex aberrant karyotype). The analysed markers are ordered
according to their q-values, beginning with the lowest q-values.

For convenience and a better understanding, Tables 1.1 to 2.10 are accompanied with
explanatory tables (Table 1.1A to 2.10A) where the numbering and the Affymetrix Id are
further defined by other parameters, e.g. gene bank accession number.

EXAMPLES

Example 1: General experimental design of the invention and results

The core binding factor (CBF) subunits CBFα2 and CBFβ are frequently involved in acute
myeloid leukemias. The CBFα2 subunit, also designated AML1 (RUNX1), is affected by
the translocation t(8;21). The beta subunit is affected by an inversion of chromosome 16
generating several variants of CBFβ-MYH11 fusion proteins. CBF oncoproteins have been
proven excellent markers for cytogenetically based prognostication as well as monitoring
of minimal residual disease. However, little is known about common CBF targets and their
relevance for leukemogenic mechanisms. Here, we analyzed comprehensive gene
expression signatures of a representative cohort of AML patients by use of microarrays
(U133 set, Affymetrix). First, gene signatures of 50 CBF positive cases, n=25 samples with
t(8;21) and inv(16) each, were compared to other balanced chromosomal aberrations
(inv(3) (n=18), t(15;17) (n=20), t(11q23)/MLL (n=31)), as well as AML with complex
aberrant karyotypes (n=34). Differentially expressed genes identified from a respective
training set consisting of 2/3 of cases were used to build a Support Vector Machine
(SVM) model. Subsequently, classification accuracy was assessed in the remaining 1/3 of
the cases. SVM subtype stratification accurately predicts all 51/51 independent test
samples. Thus, CBF leukemias share common gene signatures. Among the top 50 genes
distinguishing CBF leukemias from other AML subsets three interesting candidates were
identified. The transcription factor CCAAT/enhancer binding protein alpha, encoded by
the CEBPA gene, was found to be lower expressed in t(8;21) cases. This confirms the data
from Pabst et al. demonstrating that AML1-ETO expression downregulated CEBPA
mRNA, protein and DNA binding activity. Furthermore, we observed that also in inv(16)
CBF leukemias CEBPA expression is downregulated. Secondly, Copine VIII was found
downregulated in CBF leukemias. In more detail, Copine VIII expression was calculated as
absent in t(8;21) and inv(16) samples. Copine VIII has recently been described as novel
fusion partner of AML1 in an aggressive AML with t(12;21) translocation. AML1 was
fused out of frame with Copine VIII resulting in an abnormal translational termination of
Copine VIII. The truncated AML1 protein only contained the DNA-binding but not the
transactivation domain. It has been speculated that disruption of Copine VIII expression
confers an additional proliferative mutation. Here, our data suggest that CBF leukemias
do not express Copine VIII at all. Finally, RUNX3 (AML2) was identified to be
downregulated in CBF leukemias. RUNX3 has been reported to play a functional role in
the nervous system and lack of RUNX3 is causally related to the genesis and progression
of human gastric cancer. According to our data, it can be speculated that RUNX3
expression is also silenced in CBF leukemias due to hypermethylation of CpG islands in
the promoter region as demonstrated for mouse carcinoma cell lines. Moreover, lack of
Copine VIII as well as downregulated RUNX3 expression was also observed when CBF
leukemias were compared to AML with normal karyotypes (n=159) as well as to 51 cases
with unbalanced chromosomal aberrations: trisomy 8 (n=12), trisomy 11 (n=7), trisomy 13
(n=7), monosomy 7 (n=9), del(5q) (n=7) and del(9q) (n=9). In conclusion, besides
previously reported distinct signatures for t(8;21) and inv(16) cases, common expression
patterns caused by CBF oncoproteins could be identified. Future studies will have to focus
on those common CBF targets and functional assays need to be established proving their
leukemogenic relevance.



Example 2: General materials, methods and definitions of functional annotations

The methods section contains both information on statistical analyses used for
identification of differentially expressed genes and detailed annotation data of identified
microarray probesets.

Affymetrix Probeset Annotation

All annotation data of GeneChip® arrays are extracted from the NetAffx™ Analysis
Center (internet website: www.affymetrix.com). Files for U133 set arrays, including
U133A and U133B microarrays are derived from the June 2003 release. The original
publication refers to: Liu G, Loraine AE, Shigeta R, Cline M, Cheng J, Valmeekam V, Sun
S, Kulp D, Siani-Rose MA. NetAffx: Affymetrix probesets and annotations. Nucleic Acids
Res. 2003;31(1):82-6.

The sequence data are omitted due to their large size, and because they do not change,
whereas the annotation data are updated periodically, for example new information on
chromosomal location and functional annotation of the respective gene products. Sequence
data are available for download in the NetAffx Download Center (www.affymetrix.com).

Data fields: 

In the following section, the content of each field of the data files is described.
Microarray probesets, for example found to be differentially expressed between different
types of leukemia samples are further described by additional information. The fields are
of the following types:

1. GeneChip Array Information

2. Probe Design Information

3. Public Domain and Genomic References

1. GeneChip Array Information

HG-U133 ProbeSet_ID:

HG-U133 ProbeSet_ID describes the probe set identifier. Examples are: 200007_at,
200011_s_at, 200012_x_at.
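
As a small illustration of how such identifiers might be handled in software, the sketch below splits an identifier into its numeric part and an optional one-letter qualifier. The pattern is an assumption inferred only from the three example identifiers given above; identifiers of any other form are rejected rather than interpreted.

```python
import re

# Assumed pattern: digits, an optional single-letter qualifier, then "_at".
PROBESET_ID = re.compile(r"^(?P<number>\d+)(?:_(?P<qualifier>[a-z]))?_at$")


def parse_probeset_id(probeset_id):
    """Return (number, qualifier); qualifier is None for plain '_at' identifiers."""
    match = PROBESET_ID.match(probeset_id.strip())
    if match is None:
        raise ValueError("unrecognized probe set identifier: %r" % probeset_id)
    return match.group("number"), match.group("qualifier")
```

For instance, "200011_s_at" yields the number "200011" with qualifier "s", while "200007_at" yields the number "200007" with no qualifier.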

GeneChip: 

The description of the GeneChip probe array name where the respective probeset is
represented. Examples are: Affymetrix Human Genome U133A Array or Affymetrix
Human Genome U133B Array.

2. Probe Design Information

Sequence Type:

The Sequence Type indicates whether the sequence is an Exemplar, Consensus or Control
sequence. An Exemplar is a single nucleotide sequence taken directly from a public
database. This sequence could be an mRNA or EST. A Consensus sequence is a
nucleotide sequence assembled by Affymetrix, based on one or more sequences taken from
a public database.

Transcript ID: 

The cluster identification number with a sub-cluster identifier appended.

Sequence Derived From:

The accession number of the single sequence, or representative sequence on which the
probe set is based. Refer to the "Sequence Source" field to determine the database used.

Sequence ID:

For Exemplar sequences: Public accession number or GenBank identifier. For Consensus
sequences: Affymetrix identification number or public accession number.

Sequence Source:

The database from which the sequence used to design this probe set was taken. Examples
are: GenBank®, RefSeq, UniGene, TIGR (annotations from The Institute for Genomic
Research).



3. Public Domain and Genomic References 

Most of the data in this section come from LocusLink and UniGene databases, and are
annotations of the reference sequence on which the probe set is modeled.

Gene Symbol and Title:

A gene symbol and a short title, when one is available. Such symbols are assigned by
different organizations for different species. Affymetrix annotational data come from the
UniGene record. There is no indication which species-specific databank was used, but
some of the possibilities include for example HUGO: The Human Genome Organization.

MapLocation:

The map location describes the chromosomal location when one is available.

Unigene_Accession:

UniGene accession number and cluster type. Cluster type can be "full length" or "est", or
"---" if unknown.

LocusLink:

This information represents the LocusLink

Full Length Ref. Sequences:

Indicates the references to multiple sequences in RefSeq. The field contains the ID and
description for each entry, and there can be multiple entries per probeSet.

Example 3: Sample preparation, processing and data analysis

Method 1:

Microarray analyses were performed utilizing the GeneChip® System (Affymetrix, Santa
Clara, USA). Hybridization target preparations were performed according to recommended
protocols (Affymetrix Technical Manual). In detail, at time of diagnosis, mononuclear cells
were purified by Ficoll-Hypaque density centrifugation. They had been lysed immediately
in RLT buffer (Qiagen, Hilden, Germany), frozen, and stored at -80°C from 1 week to 38
months. For gene expression profiling cell lysates of the leukemia samples were thawed,
homogenized (QIAshredder, Qiagen), and total RNA was extracted (RNeasy Mini Kit,
Qiagen). Subsequently, 5-10 µg total RNA isolated from 1 x 10^7 cells was used as starting
material for cDNA synthesis with oligo[(dT)24T7promotor]65 primer (cDNA Synthesis
System, Roche Applied Science, Mannheim, Germany). cDNA products were purified by
phenol/chloroform/IAA extraction (Ambion, Austin, USA) and acetate/ethanol-precipitated
overnight. For detection of the hybridized target nucleic acid biotin-labeled
ribonucleotides were incorporated during the following in vitro transcription reaction
(Enzo BioArray HighYield RNA Transcript Labeling Kit, Enzo Diagnostics). After
quantification by spectrophotometric measurements and 260/280 absorbance values
assessment for quality control of the purified cRNA (RNeasy Mini Kit, Qiagen), 15 µg
cRNA was fragmented by alkaline treatment (200 mM Tris-acetate, pH 8.2/500 mM
potassium acetate/150 mM magnesium acetate) and added to the hybridization cocktail
sufficient for five hybridizations on standard GeneChip microarrays (300 µl final volume).
Washing and staining of the probe arrays was performed according to the recommended
Fluidics Station protocol (EukGE-WS2v4). Affymetrix Microarray Suite software (version
5.0.1) extracted fluorescence signal intensities from each feature on the microarrays as
detected by confocal laser scanning according to the manufacturer's recommendations.

Expression analysis quality assessment parameters included visual array inspection of the
scanned image for the presence of image artifacts and correct grid alignment for the
identification of distinct probe cells as well as both low 3'/5' ratio of housekeeping
controls (mean: 1.90 for GAPDH) and high percentage of detection calls (mean: 46.3%
present called genes). The 3' to 5' ratio of GAPDH probesets can be used to assess RNA
sample and assay quality. Signal values of the 3' probe sets for GAPDH are compared to
the Signal values of the corresponding 5' probe set. The ratio of the 3' probe set to the 5'
probe set is generally no more than 3.0. A high 3' to 5' ratio may indicate degraded RNA
or inefficient synthesis of ds cDNA or biotinylated cRNA (GeneChip® Expression
Analysis Technical Manual, www.affymetrix.com). Detection calls are used to determine
whether the transcript of a gene is detected (present) or undetected (absent) and were
calculated using default parameters of the Microarray Analysis Suite MAS 5.0 software
package.
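
The two quality parameters described above can be expressed as a short sketch. The function names are illustrative, the encoding of a present call as the string "P" is an assumption about the input format, and the cut-off of 3.0 is taken from the text.

```python
def gapdh_3to5_ratio(signal_3prime, signal_5prime):
    """Ratio of the GAPDH 3' probe set signal to the 5' probe set signal."""
    if signal_5prime <= 0:
        raise ValueError("5' signal must be positive")
    return signal_3prime / signal_5prime


def passes_rna_quality(signal_3prime, signal_5prime, max_ratio=3.0):
    """True if the 3'/5' ratio does not exceed the cut-off (3.0 in the text)."""
    return gapdh_3to5_ratio(signal_3prime, signal_5prime) <= max_ratio


def percent_present(detection_calls):
    """Percentage of probe sets called present.

    Assumption: each call is a string and "P" denotes a present call.
    """
    if not detection_calls:
        raise ValueError("no detection calls supplied")
    present = sum(call == "P" for call in detection_calls)
    return 100.0 * present / len(detection_calls)
```

A sample with 3' and 5' GAPDH signals of 1900 and 1000 has a ratio of 1.9 and passes; a ratio above 3.0 would flag possible RNA degradation or inefficient cDNA or cRNA synthesis.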

15 Method 2: 

Bone marrow (BM) aspirates are taken at the time of the initial diagnostic biopsy and 
remaining material is immediately lysed in RLT buffer (Qiagen), firozen and stored at -80 
C until preparation for gene egression analysis. For microarray analysis the GeneChip 
System (Asymetrix, Santa Clara^ CA, USA) is used. The targets for GeneChip analysis are 
20 prepared according to the current Expression Analysis. Briefly, frozen lysates of tihie 
leukemia samples are thawed, homogenized (QIAshredder, Qiagen) and . total .RNA 
extracted (RNeasy Mini Kit, Qiagen).Nonnally 10 ug total RNA isolated firom.l x 107 

* cells is used as starting material in the subsequent cDNA-Syntiiesis using 01igo-dT-T7- 
Promoter Primer (cDNA synthesis Kit, Roche Molecular Biochemicals). The cDNA is 

25 purified by phenol-chlorophorm extraction and precipitated with 100% Ethanol over night. 
For detection of the hybridized target nucleic acid biotin-labeled ribonucleotides are 
incorporated during the in vitro transcription reaction (Enzo® BioArray™ HighYield™ 
RNA Transcript Labeling Kit, ENZO). After quantification of the purified cRNA (RNeasy 
Mini Kit, Qiagen), 15 ug are firagmented by alkaline treatment (200 mM Tris-acetate, pH 

30 8.2, 500 mM potassium acetate, 150.niM magnesivun acetate) and added to the 
hybridization cocktail sufQcient for 5 hybridizations on standard GeneChip microarrays. 
Before expression profiling Test3 Probe Arrajrs (Afifymetrix) are chosen for monitoring of 
the integrity of the cRNA Only labeled cRNA-cocktails which, showed a ratio of the 
messured intensity of the 3' to the S* end of the GAPDH gene less than 3.0 are selected for 

.35 subsequent hybridization on HG-U133 probe arrays (Afiymetrix). Washing and staining 
the Probe arrays is .performed as described • (siehe Afgmietrix-Oiiginal-Litmtur 
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(LOCKHART und LIPSHUTZ). The Affymetrix software ^croarray Suite, Version 
4.0.1) extracted fluorescence intensities &om each element on the arrays as d^ected by 
confocal laser scanmng according to the manu&cturers recommendations. 
Claims

1. A method for distinguishing CBF-positive AML subtypes, preferably AML_t(8;21)
and/or AML_inv(16), from CBF-negative AML subtypes, preferably AML_inv(3),
AML_t(15;17), AML_t(11q23)/MLL (AML_MLL), and/or AML_komplext, in a
sample, the method comprising determining the expression level of markers
selected from the markers identifiable by their Affymetrix Identification Numbers
(affy id) as defined in Tables 1, and/or 2,

wherein

a lower expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 1.1 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 1.1 having a positive fc value,

is indicative for the presence of AML_CBF when AML_CBF is distinguished
from all other subtypes,

and/or wherein

a lower expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 1.2 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 1.2 having a positive fc value,

is indicative for the presence of AML_MLL when AML_MLL is distinguished
from all other subtypes,

and/or wherein

a lower expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 1.3 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 1.3 having a positive fc value,

is indicative for the presence of AML_inv(3) when AML_inv(3) is
distinguished from all other subtypes,

and/or wherein

a lower expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 1.4 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 1.4 having a positive fc value,

is indicative for the presence of AML_komplext when AML_komplext is
distinguished from all other subtypes,

and/or wherein 

a lower expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 1.5 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 1.5 having a positive fc value,

is indicative for the presence of AML_t(15;17) when AML_t(15;17) is 
distinguished from all other subtypes, 

and/or wherein

a lower expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.1 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.1 having a positive fc value,

is indicative for the presence of AML_CBF when AML_CBF is distinguished
from AML_MLL,

and/or wherein 

a lower expression of at least one polynucleotide defined by at least one of the 
numbers 1 to 50 of Table 2.2 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.2 having a positive fc value,

is indicative for the presence of AML_CBF when AML_CBF is distinguished
from AML_inv(3),

and/or wherein

a lower expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.3 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the 
numbers 1 to 50 of Table 2.3 having a positive fc value, 

is indicative for the presence of AML_CBF when AML_CBF is distinguished 
from AML_komplext, 

and/or wherein 

a lower expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.4 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.4 having a positive fc value,

is indicative for the presence of AML_CBF when AML_CBF is distinguished
from AML_t(15;17),

and/or wherein 

a lower expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.5 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.5 having a positive fc value,

is indicative for the presence of AML_MLL when AML_MLL is distinguished
from AML_inv(3),

and/or wherein

a lower expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.6 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.6 having a positive fc value,

is indicative for the presence of AML_MLL when AML_MLL is distinguished
from AML_komplext,

and/or wherein 

a lower expression of at least one polynucleotide defined by at least one of the 
numbers 1 to 50 of Table 2.7 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.7 having a positive fc value,

is indicative for the presence of AML_MLL when AML_MLL is distinguished
from AML_t(15;17),

and/or wherein 

a lower expression of at least one polynucleotide defined by at least one of the 
numbers 1 to 50 of Table 2.8 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.8 having a positive fc value,

is indicative for the presence of AML_inv(3) when AML_inv(3) is
distinguished from AML_komplext,

and/or wherein

a lower expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.9 having a negative fc value, and/or

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.9 having a positive fc value,

is indicative for the presence of AML_inv(3) when AML_inv(3) is
distinguished from AML_t(15;17),

and/or wherein 

a lower expression of at least one polynucleotide defined by at least one of the 
numbers 1 to 50 of Table 2.10 having a negative fc value, and/or 

a higher expression of at least one polynucleotide defined by at least one of the
numbers 1 to 50 of Table 2.10 having a positive fc value,

is indicative for the presence of AML_komplext when AML_komplext is
distinguished from AML_t(15;17).



2. The method according to claim 1, wherein the polynucleotide is labelled.

3. The method according to claim 1 or 2, wherein the label is a luminescent,
preferably a fluorescent label, an enzymatic or a radioactive label.

4. The method according to at least one of the claims 1-3, wherein the expression level
of at least two, preferably of at least ten, more preferably of at least 25, most
preferably of 50 of the markers of at least one of the Tables 1.1-2.10 is determined.

5. The method according to at least one of the claims 1-4, wherein the expression
level of markers expressed lower in a first subtype than in at least one second
subtype, which differs from the first subtype, is at least 5%, 10% or 20%, more
preferred at least 50% or may even be 75% or 100%, i.e. 2-fold lower, preferably at
least 10-fold, more preferably at least 50-fold, and most preferably at least 100-fold
lower in the first subtype.

6. The method according to at least one of the claims 1-4, wherein the expression
level of markers expressed higher in a first subtype than in at least one second
subtype, which differs from the first subtype, is at least 5%, 10% or 20%, more
preferred at least 50% or may even be 75% or 100%, i.e. 2-fold higher, preferably
at least 10-fold, more preferably at least 50-fold, and most preferably at least 100-
fold higher in the first subtype.

7. The method according to at least one of the claims 1-6, wherein the sample is from
an individual having AML.

8. The method according to at least one of the claims 1-7, wherein at least one
polynucleotide is in the form of a transcribed polynucleotide, or a portion thereof.



9. The method according to claim 8, wherein the transcribed polynucleotide is an
mRNA or a cDNA.

10. The method according to claim 8 or 9, wherein the determining of the expression
level comprises hybridizing the transcribed polynucleotide to a complementary
polynucleotide, or a portion thereof, under stringent hybridization conditions.

11. The method according to at least one of the claims 1-7, wherein at least one
polynucleotide is in the form of a polypeptide, or a portion thereof.

12. The method according to at least one of the claims 8, 9 or 11, wherein the
determining of the expression level comprises contacting the polynucleotide or the
polypeptide with a compound specifically binding to the polynucleotide or the
polypeptide.

13. The method according to claim 12, wherein the compound is an antibody, or a
fragment thereof.

14. The method according to at least one of the claims 1-13, wherein the method is 
carried out on an array. 

15. The method according to at least one of the claims 1-14, wherein the method is
carried out in a robotics system.

16. The method according to at least one of the claims 1-15, wherein the method is
carried out using microfluidics.

17. Use of at least one marker as defined in at least one of the claims 1-3 for the
manufacturing of a diagnostic for distinguishing CBF-positive AML subtypes from
CBF-negative AML subtypes.

18. The use according to claim 17 for distinguishing CBF-positive AML subtypes from
CBF-negative AML subtypes.

19. A diagnostic kit containing at least one marker as defined in at least one of the
claims 1-3 for distinguishing CBF-positive AML subtypes from CBF-negative
AML subtypes, in combination with suitable auxiliaries.

20. The diagnostic kit according to claim 19, wherein the kit contains a reference for
the CBF-positive AML subtype and/or the CBF-negative AML subtype.

21. The diagnostic kit according to claim 20, wherein the reference is a sample or a
databank.

22. An apparatus for distinguishing CBF-positive AML subtypes from CBF-negative
AML subtypes in a sample, the apparatus containing a reference data bank.



23. The apparatus according to claim 22, wherein the reference data bank is obtainable
by comprising

(a) compiling a gene expression profile of a patient sample by determining the
expression level of at least one marker selected from the markers
identifiable by their Affymetrix Identification Numbers (affy id) as defined
in Tables 1, and/or 2, and

(b) classifying the gene expression profile by means of a machine learning
algorithm.

24. The apparatus according to claim 23, wherein the machine learning algorithm is
selected from the group consisting of Weighted Voting, K-Nearest Neighbors,
Decision Tree Induction, Support Vector Machines, and Feed-Forward Neural
Networks, preferably Support Vector Machines.



25. The apparatus according to at least one of the claims 22-24, wherein the apparatus
contains a control panel and/or a monitor.

26. A reference data bank for distinguishing CBF-positive AML subtypes from CBF-
negative AML subtypes obtainable by comprising

(a) compiling a gene expression profile of a patient sample by determining the
expression level of at least one marker selected from the markers
identifiable by their Affymetrix Identification Numbers (affy id) as defined
in Tables 1, and/or 2, and

(b) classifying the gene expression profile by means of a machine learning
algorithm.



27. The reference data bank according to claim 26, wherein the reference data bank is
backed up and/or contained in a computational memory chip.
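
The direction rule recited in claim 1 (a marker with a negative fc value is expected to be expressed lower, a marker with a positive fc value higher, in the subtype to be identified) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions only: the function names, the probe identifiers and all numerical values are hypothetical and are not taken from Tables 1 or 2, and the fc value is assumed here to denote a signed fold change. The helper `percent_to_fold` merely restates the arithmetic of claim 6, where an expression that is 100% higher corresponds to a 2-fold higher expression.

```python
from typing import Dict, List


def percent_to_fold(percent_higher: float) -> float:
    """Convert 'x% higher' into a fold value (100% higher -> 2-fold)."""
    return 1.0 + percent_higher / 100.0


def concordant_markers(
    marker_fc: Dict[str, float],
    sample: Dict[str, float],
    reference: Dict[str, float],
) -> List[str]:
    """Return the markers whose observed direction of change matches the fc sign.

    marker_fc : hypothetical mapping of probe identifier -> signed fc value
    sample    : expression levels measured in the sample under test
    reference : expression levels of the subtype(s) compared against
    """
    hits = []
    for probe_id, fc in marker_fc.items():
        if probe_id not in sample or probe_id not in reference:
            continue  # marker not measured; it cannot support the call
        diff = sample[probe_id] - reference[probe_id]
        higher_as_expected = fc > 0 and diff > 0
        lower_as_expected = fc < 0 and diff < 0
        if higher_as_expected or lower_as_expected:
            hits.append(probe_id)
    return hits
```

For three hypothetical probes with fc values of 2.5, -3.0 and 1.8, a sample that is higher in the first, lower in the second and lower in the third would yield the first two probes as concordant. How many concordant markers are required for a call is left open here, as claim 1 only requires at least one.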
Abstract

Disclosed is a method for distinguishing CBF-positive AML subtypes from CBF-negative
AML subtypes in a sample by determining the expression level of markers, as well as a
diagnostic kit and an apparatus containing the markers.
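
The classification of a gene expression profile against a reference data bank, as recited in claims 23, 24 and 26, may be illustrated by the following Python sketch. It uses K-Nearest Neighbors, one of the algorithms listed in claim 24, only because it can be written without external libraries; Support Vector Machines are named there as preferred. The function names, the two-marker profiles and the layout of the reference bank are assumptions made for illustration and are not taken from the disclosure.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical reference bank entry: (expression profile, subtype label)
ReferenceEntry = Tuple[Sequence[float], str]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two profiles of equal length."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same markers")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(
    profile: Sequence[float],
    reference_bank: List[ReferenceEntry],
    k: int = 3,
) -> str:
    """Assign the subtype label held by the majority of the k nearest entries."""
    if k < 1 or k > len(reference_bank):
        raise ValueError("k must be between 1 and the size of the reference bank")
    nearest = sorted(reference_bank, key=lambda e: euclidean(profile, e[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In practice the profile would hold the expression levels of the selected markers in a fixed order, and an odd k avoids ties when only two subtypes are compared.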